Abstract


This work presented a novel approach to dynamic collision avoidance in mobile robots by integrating a hybrid 
Deep Deterministic Policy Gradient (DDPG) and Adaptive Neuro-fuzzy Inference Systems (ANFIS) algorithm. 
This combined approach aimed to enhance the robot's navigation capabilities in dynamic environments by 
leveraging the complementary strengths of both DDPG and ANFIS. The model achieved strong results,
including a high efficiency score of 0.97012, a robustness rating of 1 (indicating no collisions during testing),
consistent maintenance of a 0.2-meter safety distance, and a success rate of 97.8%. Additionally, the average
completion time of 5.154 seconds demonstrated its real-time decision-making capability, making it suitable for
time-sensitive applications. The proposed hybrid algorithm improved the robot's obstacle detection and
decision-making abilities, leading to superior performance in dynamic obstacle avoidance scenarios.


Keywords: Mobile robot, collision avoidance, Deep Deterministic Policy Gradient, Adaptive neuro-fuzzy 
inference systems, dynamic environments, autonomous navigation 


1. Introduction 


The rapid proliferation of robotics across diverse sectors, including manufacturing, healthcare, transportation, and 
space exploration, highlights the growing importance of autonomous systems. A fundamental challenge in 
developing such robots lies in their ability to navigate effectively and negotiate obstacles within complex and 
dynamic environments. Successful autonomous navigation and obstacle avoidance necessitate robust
decision-making mechanisms capable of handling uncertainties and making timely and informed choices
(van de Merwe, 2024).


Researchers have explored various methodologies to address this challenge, including machine learning 
paradigms, fuzzy logic, and control theory. Machine learning techniques, particularly deep reinforcement 
learning, have shown promise in enabling robots to learn from experiences and adapt their behaviors accordingly. 
Fuzzy logic, on the other hand, provides a sophisticated framework for reasoning under uncertainty and 
incorporating linguistic variables in decision-making processes (Karaduman, 2024). 


The integration of these diverse methodologies holds promise for enhancing autonomous robot navigation and 
obstacle avoidance. Notably, the hybridization of deep reinforcement learning with neuro-fuzzy systems, 
exemplified by the hybrid deep Q-learning-neuro-fuzzy network, empowers robots to navigate and circumvent 
obstacles in intricate environments. 


Within this hybrid framework, the deep Q-learning component leverages neural networks to approximate the
Q-values (expected cumulative rewards) associated with the robot's actions across varying environmental states.
This component learns optimal policies through experience. Conversely, the fuzzy logic component employs fuzzy
sets and rules to address the inherent uncertainty and imprecision in sensor data and environmental conditions.
This allows for flexible decision-making despite incomplete information (Brown & Zhang, 2019).


Several research efforts have investigated the efficacy of employing hybrid deep Q-learning-neuro-fuzzy
networks for autonomous robot navigation and obstacle avoidance. For instance, Li et al. (2021) proposed a 
hybrid methodology integrating a neuro-fuzzy system with deep reinforcement learning techniques, yielding 
superior performance compared to conventional approaches across simulated and real-world scenarios. 


Despite the promising outcomes observed thus far, the application of hybrid deep Q-learning-neuro-fuzzy 
networks in autonomous robot navigation and obstacle avoidance remains an active area of research. Further 
investigations are warranted to comprehensively evaluate the effectiveness and robustness of this approach across 
diverse environmental contexts and operational conditions. 


In summary, the integration of deep reinforcement learning and fuzzy logic systems within the hybrid deep
Q-learning-neuro-fuzzy network presents a promising avenue for enhancing autonomous robot navigation and
obstacle avoidance capabilities within complex and dynamic environments. By synergistically harnessing the 
strengths of these methodologies, this approach enables robots to acquire adaptive behaviors and perform resilient 
decision-making, leading to safe and efficient operations. 


2. Related Works 


Cimurs et al. (2020) introduced goal-oriented obstacle avoidance with deep reinforcement learning (DRL), which
is effective in dynamic environments but challenged by data dependency, sample inefficiency, and computational
demands, limiting scalability for real-world applications.


Gao et al. (2020) presented Deep reinforcement learning for indoor mobile robot path planning, demonstrating 
efficient navigation but constrained by high data dependency and computational complexity, affecting robustness 
in unforeseen scenarios. 


Fan et al. (2020) presented Distributed DDPG, scalable for multi-robot collision avoidance with efficient
inter-robot communication. Nevertheless, coordinated training for multiple robots is necessary, limiting its
applicability to collaborative tasks in complex environments.


Sangiovanni et al. (2020) introduced Deep Q-learning with a self-configuring path planning mechanism based on 
obstacle recognition, demonstrating its flexibility across diverse environments. However, its effectiveness relies 
heavily on significant training data. 


Liang et al. (2020) presented Deep Q-learning with multi-sensor fusion, facilitating real-time navigation in dense 
environments by integrating data from multiple sensors. Nevertheless, its practical application is limited by the 
high complexity associated with training, making it less suitable for robots requiring comprehensive 
environmental awareness. 


Patel et al. (2020) proposed Dynamically feasible DDPG, aiming to generate safe navigation paths while 
considering robot dynamics. While effective for robots with motion limitations and safety constraints, this method 
suffers from computational complexity and slow training convergence. 


Cheng et al. (2022) introduced Deep Q-learning tailored for nonholonomic robots, enabling path following and
obstacle avoidance while considering kinematic constraints. However, challenges related to accurate robot
modeling impact stability, especially in real-world navigation scenarios.


Cong (2023) proposed Q-learning for dynamic environment navigation, effectively combining path following
with obstacle avoidance. However, performance was limited in complex environments, and the method is
primarily suited to simpler robot configurations and controlled settings.


Lu & Huang (2021) introduced autonomous navigation in uncertain environments based on DRL, effectively 
adapting to environmental changes but constrained by data dependency and computational complexity. 


Wenzel et al. (2021) presented vision-based obstacle avoidance with DRL, relying exclusively on visual data but 
constrained by data dependency and computational complexity. 


Choi et al. (2021) proposed Deep Q-learning for integrated path planning and obstacle avoidance, showcasing 
adaptability across diverse environmental settings. Nonetheless, customization for specific robot configurations is 
critical for optimal performance. 


Feng et al. (2021) applied Deep Deterministic Policy Gradient (DDPG) to collision avoidance, offering rapid
avoidance capabilities and compatibility with continuous action spaces. However, challenges such as
hyperparameter sensitivity and generalization to various obstacle configurations remain unaddressed.


Patel et al. (2021) introduced DWA-RL for efficient navigation in dynamic crowds, achieving high collision 
avoidance rates. However, computational overheads, particularly in dense pedestrian environments, pose 
significant challenges. 


Song et al. (2021) proposed multimodal DRL with auxiliary tasks for indoor mobile robot obstacle avoidance, 
demonstrating robustness but facing challenges in data integration and computational complexity. 


Li et al. (2021) proposed a behavior-based navigation method combining deep reinforcement learning with
rule-based strategies. Despite its robustness, challenges pertaining to data dependency and computational
complexity persist, especially concerning the balance between learning and rule-based approaches.


Almazrouei et al. (2023) proposed Deep Q-learning with prioritized experience replay, aiming to facilitate 
efficient learning for dynamic obstacle avoidance. Nevertheless, addressing issues like overfitting with limited 
data remains a critical concern. 


Cong (2023) proposed path following and obstacle avoidance using reinforcement learning, effective in dynamic 
environments but limited by data dependency and computational complexity. 


The reviewed studies highlight the potential of DRL and hybrid DRL methodologies in enhancing mobile robot 
navigation and obstacle avoidance capabilities. These approaches facilitate adaptive learning and improved 
navigation accuracy in challenging environments, albeit with notable limitations regarding data dependency and 
computational demands. Addressing these limitations and exploring new research avenues that integrate DRL 
with other techniques, such as multi-modal sensing and domain adaptation, hold promise for further 
advancements in robust mobile robot navigation and obstacle avoidance systems. 


3. Methodology 


This study presents a novel research methodology for tackling dynamic obstacle avoidance in mobile robots. It 
combines the strengths of Deep Deterministic Policy Gradient (DDPG) and Adaptive Neuro-fuzzy Inference 
Systems (ANFIS) strategies to create a robust and efficient system. The method integrates three key components: 
• Obstacle Avoidance via Pareto Optimization: This component identifies and prioritizes feasible
navigation paths while considering multiple objectives, such as safety, efficiency, and energy
consumption. Pareto optimization ensures a balance between these objectives, leading to robust and
adaptable decision-making.

• Decision-Making Using DDPG: DDPG, a model-free reinforcement learning algorithm, utilizes
actor-critic techniques and deep neural networks to handle continuous action spaces. This allows the robot to
learn optimal control policies through iterative refinement based on its interaction with the environment.

• Sensor Data Processing through Adaptive Neuro-Fuzzy Inference System (ANFIS): ANFIS provides a
flexible and computationally efficient framework for processing sensor data and extracting relevant
information for obstacle detection and characterization. This information is then fed into the
decision-making module.
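
The Pareto step in the first component can be illustrated with a minimal sketch. It assumes each candidate path has already been scored on three costs to be minimized (collision risk, travel time, energy); the candidate values and function names are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if cost vector a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(costs: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, c in enumerate(costs)
        if not any(dominates(other, c) for j, other in enumerate(costs) if j != i)
    ]


# Hypothetical candidate paths: (collision risk, travel time in s, energy in J).
candidates = [
    (0.10, 5.0, 12.0),
    (0.05, 6.0, 14.0),
    (0.10, 6.5, 15.0),  # worse than the first candidate on time and energy, equal on risk
    (0.02, 9.0, 20.0),
]
front = pareto_front(candidates)
print(front)
```

Only the non-dominated candidates remain after filtering; choosing one path from that set still requires a preference between the objectives, which this sketch leaves open.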


By leveraging the synergy between these components, the hybrid system empowers robots to navigate safely and
efficiently in complex and dynamic environments.


Effective navigation necessitates the ability to handle long-term dependencies and dynamic changes in sensor
data. FS-LSTM offers a computationally efficient alternative to traditional LSTMs, effectively managing these
challenges. It processes sensor data, extracting relevant features and temporal dependencies, which are then fed
into the DDPG algorithm.


Within the integrated system, the DDPG algorithm utilizes the actor and critic networks. The actor network
generates control actions based on the processed sensor data, while the critic network evaluates their
effectiveness. Both networks are continuously trained through backpropagation and gradient descent to minimize
the difference between expected and actual rewards, leading to optimal policy refinement over time.


During operation, the robot continuously senses its surroundings, feeding data into the FS-LSTM and DDPG
algorithm. This enables the system to generate optimal control actions for real-time dynamic obstacle avoidance.
This approach has demonstrated superior performance compared to traditional methods in both simulated and
real-world experiments.
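
The "difference between expected and actual rewards" that the critic minimizes can be sketched using the standard DDPG temporal-difference target, which the paper does not spell out and which is assumed here. The discount factor is the value reported in Table 1; the mini-batch numbers and function names are hypothetical.

```python
import numpy as np

GAMMA = 0.995  # DiscountFactor as reported in Table 1


def td_targets(rewards, next_q, dones, gamma=GAMMA):
    """Standard DDPG target: y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    dones = np.asarray(dones, dtype=float)
    return rewards + gamma * (1.0 - dones) * next_q


def critic_loss(q_pred, targets):
    """Mean squared error between the critic's estimates and the TD targets."""
    return float(np.mean((np.asarray(q_pred, dtype=float) - targets) ** 2))


# Hypothetical mini-batch of two transitions.
rewards = [1.0, 0.0]
next_q = [10.0, 5.0]   # stand-in for the target critic evaluated at the target actor's action
dones = [False, True]  # the second transition ended its episode
targets = td_targets(rewards, next_q, dones)
loss = critic_loss([10.0, 1.0], targets)
print(targets, loss)
```

In a full implementation the actor would then be updated along the gradient of the critic's output with respect to the action; that step needs an automatic differentiation library and is omitted from this sketch.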


This research builds upon the initial hybrid approach by incorporating ANFIS, Pareto optimization, and improved
FS-Deep Deterministic Policy Gradient (FS-DDPG) to further enhance performance, robustness, and scalability.
This addresses various limitations of existing methods, including:
i. Sensor constraints: ANFIS facilitates efficient sensor data processing and extraction of relevant features,
even with limited sensor capabilities.
ii. Computational demands: FS-DDPG offers improved computational efficiency compared to traditional
DDPG, making it suitable for real-time applications.
iii. Limited predictability: Pareto optimization allows for balancing conflicting objectives and adapting to
dynamic environments with limited predictability.
iv. Over-reliance on training data: The improved FS-DDPG architecture incorporates techniques to reduce
dependence on extensive training data.
v. Multi-objective optimization challenges: Pareto optimization provides a robust framework for handling
multi-objective decision-making in dynamic environments.
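
To make the ANFIS sensor-processing step more concrete, the sketch below shows the forward pass of a generic first-order Sugeno ANFIS with Gaussian membership functions. The two inputs (distance and bearing to an obstacle), the two rules, and every parameter value are hypothetical; the paper does not report the rule base or membership functions it used.

```python
import numpy as np


def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x for a fuzzy set with the given center and width."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(x, centers, sigmas, consequents):
    """Forward pass of a first-order Sugeno ANFIS.

    x:           (n_inputs,) input vector
    centers:     (n_rules, n_inputs) membership centers
    sigmas:      (n_rules, n_inputs) membership widths
    consequents: (n_rules, n_inputs + 1) linear coefficients, last column is the bias
    """
    x = np.asarray(x, dtype=float)
    memberships = gaussian_mf(x, centers, sigmas)             # layer 1: fuzzification
    firing = memberships.prod(axis=1)                         # layer 2: rule firing strengths
    weights = firing / (firing.sum() + 1e-12)                 # layer 3: normalization
    rule_out = consequents[:, :-1] @ x + consequents[:, -1]   # layer 4: linear consequents
    return float(weights @ rule_out)                          # layer 5: weighted sum


# Hypothetical inputs: distance to obstacle (m) and bearing to obstacle (rad).
x = np.array([0.6, 0.1])
centers = np.array([[0.5, 0.0], [2.0, 0.0]])   # rule 1: obstacle near, rule 2: obstacle far
sigmas = np.array([[0.5, 1.0], [0.5, 1.0]])
consequents = np.array([[0.0, 0.0, 1.0],       # near: output 1.0 (high threat)
                        [0.0, 0.0, 0.0]])      # far:  output 0.0 (low threat)
threat = anfis_forward(x, centers, sigmas, consequents)
print(threat)
```

In ANFIS training the membership and consequent parameters are fitted from data; here they are fixed by hand purely to show how a sensor reading is mapped to a single output.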


Evaluation and Expected Outcomes: 
The comprehensive solution aims to achieve higher success rates in obstacle avoidance, particularly in scenarios 
with dense and dynamic obstacles. Evaluation will be conducted using the University of Michigan North Campus 
Long-Term Vision and LIDAR Dataset. The expected outcomes include: 

i. Improved obstacle avoidance performance compared to existing methods. 
ii. Enhanced robustness and adaptability to dynamic and unpredictable environments.
iii. Increased scalability for real-world applications with resource constraints. 


[Figure: block diagram of the proposed system. Labels recoverable from the original: input sensor data, ANFIS, decision-making module (actor network, batch normalization), obstacle avoidance module, Pareto optimization.]

Figure 2: Segway robotic platform with its sensor outfit
(Source: Carlevaris-Bianco, Ushani & Eustice, 2016).


Figure 2 shows the Segway mobile robot outfitted with an RTK GPS (1), omni-directional camera (2), 3D lidar
(3), IMU (4), consumer-grade GPS (5), 1-axis FOG (6), 2D lidars (7), and CPU (8).


Page 20 
© The Author(s), under exclusive license to KJCS 2024 


Online ISSN: 
Print ISSN: 
www.kjcs.com.ng 


KiJICES 


Kasu Journal of Computer Science Vol.1 No.1 [March, 2024], pp. 16-27 
https: //doi.or: 


Figure 3: Sample trajectory from one session of data collection, overlaid on satellite imagery 
(Source: Carlevaris-Bianco, Ushani & Eustice, 2016).


This study employs a comprehensive set of criteria to evaluate the performance of mobile robots in dynamic 
obstacle avoidance scenarios. These criteria encompass various aspects of the robot's behavior and the 
effectiveness of the implemented algorithm. 


i. Success Rate: This primary metric measures the proportion of successful obstacle avoidance maneuvers relative 
to the total number of attempts. It quantifies the overall effectiveness of the robot's strategy in navigating dynamic 
environments. 


ii. Completion Time: This parameter assesses the duration taken by the robot to execute the obstacle avoidance 
maneuver. It captures the speed and efficiency of the robot's response to detected obstacles, measured from the 
initial detection to the resumption of its intended path. 


iii. Distance to Obstacle: This metric indicates the minimum distance maintained by the robot from the obstacle 
throughout the avoidance maneuver. It reflects the robot's ability to navigate safely while minimizing the risk of 
collision. 


iv. Safety Distance: This criterion defines the minimum acceptable distance between the robot and the obstacle to 
ensure safe navigation. It establishes a threshold for the robot's proximity to obstacles, beyond which potential 
collisions could occur. 


v. Motion Smoothness: This evaluation assesses the fluidity of the robot's movement during the avoidance 
maneuver. It focuses on the absence of abrupt changes or jerky motions in velocity, ensuring smooth and 
controlled navigation. 


vi. Computational Efficiency: This parameter evaluates the time and computational resources consumed by the 
obstacle avoidance algorithm to generate a solution. It is determined by the ratio of the total number of tasks 
executed by the algorithm to the overall time taken for their execution. This metric reflects the algorithm's 
scalability and suitability for real-time applications. 
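
The criteria above can be computed from logged trial data roughly as follows. This is a minimal sketch: the function names and sample values are hypothetical, and motion smoothness is approximated here by mean absolute acceleration, which is one possible choice that the paper does not specify.

```python
import numpy as np

SAFETY_DISTANCE_M = 0.2  # safety threshold reported in the paper


def success_rate(outcomes):
    """Fraction of avoidance attempts that succeeded."""
    return sum(bool(o) for o in outcomes) / len(outcomes)


def min_obstacle_distance(robot_xy, obstacle_xy):
    """Smallest robot-to-obstacle distance over a trajectory of (x, y) samples."""
    diff = np.asarray(robot_xy, dtype=float) - np.asarray(obstacle_xy, dtype=float)
    return float(np.min(np.linalg.norm(diff, axis=1)))


def motion_smoothness(speeds, dt):
    """Mean absolute acceleration; lower values indicate smoother motion (assumed proxy)."""
    return float(np.mean(np.abs(np.diff(speeds)) / dt))


def computational_efficiency(tasks_executed, total_time_s):
    """Tasks executed per second, following the ratio described in criterion vi."""
    return tasks_executed / total_time_s


# Hypothetical logged data for illustration.
rate = success_rate([True, True, False, True])
closest = min_obstacle_distance([(0, 0), (1, 0), (2, 0)], (1, 1))
smooth = motion_smoothness([0.0, 0.5, 0.5], dt=0.1)
throughput = computational_efficiency(50, 10.0)
is_safe = closest >= SAFETY_DISTANCE_M
print(rate, closest, smooth, throughput, is_safe)
```

Completion time needs no formula beyond subtracting the detection timestamp from the timestamp at which the robot resumes its intended path.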


3. Results and Discussion
The training parameters of the Hybrid DDPG-ANFIS model are listed in Table 1.


Table 1: Parameters of the Hybrid DDPG-ANFIS model 


Parameters                        Values
LearnRate                         0.001
L2RegularizationFactor            0.0001
GradientThreshold                 1
SampleTime                        0.1
TargetSmoothFactor                0.001
DiscountFactor                    0.995
MiniBatchSize                     128
ExperienceBufferLength            1000000
NoiseOptions.Variance             0.1
NoiseOptions.VarianceDecayRate    0.00001
MaxEpisodes                       10000
ScoreAveragingWindowLength        50
StopTrainingValue                 400
PopulationSize                    200
MaxGenerations                    25


Table 1 lists the parameters of the Hybrid DDPG-ANFIS model. An L2 regularization factor of 0.0001 guards
against overfitting, a gradient threshold of 1 stabilizes training, and a learning rate of 0.001 gives balanced
weight updates. The agent interacts with the environment every 0.1 seconds (the sample time), which sets the
frequency of decision-making. Training runs for at most 10,000 episodes with a target smoothing factor of 0.001,
a mini-batch size of 128 experiences, and a discount factor of 0.995 that favors long-term rewards. Training stops
once the average reward over a 50-episode window reaches 400.
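
Two of the Table 1 parameters are easier to read next to the rules they control. The sketch below assumes the usual soft (Polyak) target-network update for the smoothing factor and a moving-average stopping test for the 50-episode window; the function names are hypothetical, and the authors' toolchain may implement these details differently.

```python
import numpy as np

TAU = 0.001            # TargetSmoothFactor from Table 1
WINDOW = 50            # ScoreAveragingWindowLength from Table 1
STOP_VALUE = 400.0     # StopTrainingValue from Table 1


def soft_update(target_params, online_params, tau=TAU):
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def should_stop(episode_rewards, window=WINDOW, threshold=STOP_VALUE):
    """Stop once the mean reward of the last `window` episodes reaches the threshold.

    Assumption: no stop is triggered before a full window of episodes is available.
    """
    if len(episode_rewards) < window:
        return False
    return float(np.mean(episode_rewards[-window:])) >= threshold


# Hypothetical parameters: a target network at zeros tracking an online network at ones.
updated = soft_update([np.zeros(2)], [np.ones(2)])
print(updated[0])
print(should_stop([0.0] * 50 + [500.0] * 50))
```

With a smoothing factor this small the target networks change slowly, which is the usual reason for using it: the critic's regression targets stay nearly fixed between updates.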


Figure 4: Environment-agent diagram




Figure 4 illustrated the environment, consisting of elements such as a reward, action, termination signal (isDone),
and observation (state). The reward provided feedback on the agent's performance, while the observation,
depicted as a 7x1 vector, captured the current state of the environment. The action, represented as a 2x1 vector,
indicated the agent's decision, while the isDone signal marked the end of an episode. The agent
operating within the DDPG algorithm pursued a navigation strategy aimed at maximizing cumulative
rewards within the environment by balancing exploration and exploitation. This navigation strategy relied on an
iterative feedback loop where the agent:


Observed: Gathered information about the environment using its sensors. 

Earned rewards: Received feedback from the environment based on its actions. 

Selected actions: Made decisions based on the observed state and its learned strategy. 

Responded to feedback: Adapted its actions in response to the "isDone" signal, indicating the end of an episode. 


This iterative cycle of observation, reward, action, and response enabled the agent to learn effectively and 
successfully navigate the environment. The primary challenge lay in striking a balance between exploration, 
which involves venturing into new areas to discover potential rewards, and exploitation, which focuses on 
maximizing rewards in known high-yield regions. The DDPG algorithm addressed this trade-off by incorporating 
exploration noise into the action selection process, thereby facilitating both the discovery and refinement of the 
navigation strategy. 
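
The feedback loop and the role of exploration noise can be sketched as below. The environment is a hypothetical stand-in that only reproduces the interface described above (7x1 observation, 2x1 action, reward, isDone); the noise is plain Gaussian with the variance and decay rate from Table 1, and the multiplicative decay rule is an assumption rather than a detail confirmed by the paper.

```python
import numpy as np


class ToyEnv:
    """Hypothetical stand-in environment with a 7x1 observation and a 2x1 action."""

    def __init__(self, max_steps=20, seed=0):
        self.rng = np.random.default_rng(seed)
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.rng.standard_normal(7)

    def step(self, action):
        self.steps += 1
        obs = self.rng.standard_normal(7)
        reward = -float(np.sum(np.square(action)))  # placeholder reward, not the paper's
        is_done = self.steps >= self.max_steps
        return obs, reward, is_done


def policy(obs, weights):
    """Toy deterministic actor: a 2x7 linear map squashed to [-1, 1]."""
    return np.tanh(weights @ obs)


def run_episode(env, weights, variance=0.1, decay_rate=1e-5, seed=0):
    """Observe, act with exploration noise, collect reward, repeat until isDone."""
    rng = np.random.default_rng(seed)
    obs = env.reset()
    total, is_done = 0.0, False
    while not is_done:
        noise = rng.normal(0.0, np.sqrt(variance), size=2)
        action = np.clip(policy(obs, weights) + noise, -1.0, 1.0)
        obs, reward, is_done = env.step(action)
        total += reward
        variance *= 1.0 - decay_rate  # assumed multiplicative decay of the noise variance
    return total, variance


env = ToyEnv(max_steps=20)
total_reward, final_variance = run_episode(env, np.zeros((2, 7)))
print(total_reward, final_variance)
```

As the variance shrinks, the actions approach the deterministic policy output, shifting the agent from exploration toward exploitation.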


Figure 5: Robot navigation in Task 1, Task 2, and Task 3


Figure 5 presented three tasks aimed at assessing the dynamic obstacle avoidance prowess of the robot across
diverse environments:


Task 1: The robot embarked from position (5m, 13m) with the objective of reaching the designated endpoint 
(23m, 13m) while skillfully evading the obstacle positioned at (13m, 5m). 


Task 2: Commencing from coordinates (5m, 21m), the robot navigated towards the designated destination (15m, 
5m), exercising caution to circumvent an obstruction located at (10m, 10m). 


Task 3: Starting at (21m, 21m), the robot was tasked with successfully reaching the target location (5m, 5m) while
adeptly navigating around the obstacle situated at (10m, 10m).


The successful completion of these tasks provided valuable insights into the effectiveness of the implemented 
obstacle avoidance strategies. 
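
As a rough check on how demanding each task is, the sketch below computes the distance from each obstacle to the straight line between start and goal. The coordinates are those listed above, with the Task 3 target read as (5 m, 5 m); treating the direct line as the robot's nominal path is an assumption made only for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


# Task coordinates in meters; the Task 3 goal is read as (5, 5).
TASKS = {
    "Task 1": {"start": (5, 13), "goal": (23, 13), "obstacle": (13, 5)},
    "Task 2": {"start": (5, 21), "goal": (15, 5), "obstacle": (10, 10)},
    "Task 3": {"start": (21, 21), "goal": (5, 5), "obstacle": (10, 10)},
}

clearance = {
    name: point_segment_distance(t["obstacle"], t["start"], t["goal"])
    for name, t in TASKS.items()
}
for name, dist in clearance.items():
    print(f"{name}: {dist:.2f} m from the direct path")
```

Under these assumptions the Task 3 obstacle lies directly on the straight line to the goal, so that task cannot be completed without a detour, whereas the Task 1 obstacle sits well away from the direct route.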


[Plot: Binary Occupancy Grid, with X [meters] and Y [meters] axes each running from 0 to 25.]


Figure 6: The navigation grid showing the robot (red) starting at position (10 m, 20 m) and navigating the room.


Figure 6 showed the navigation grid with the route taken by the robot (in red) from its starting point
(10 m, 20 m) to the designated objective (0 m, 0 m) in a virtual environment. The figure also displayed the
robot's constantly updated course, which was based on its present coordinates and the intended destination.


The robot's successful navigation to the intended destination without running into any obstacle demonstrated how 
effective the proposed algorithm was at controlling its movement. This demonstrated the robot's capacity for 
independent navigation in dynamic situations, which was a significant finding of the study. 


Table 2 summarizes the performance parameters obtained after simulating the three models.


Table 2: Results for each model 


Parameters              Hybrid DDPG-ANFIS    ANFIS      DDPG
Efficiency              0.97012              96.158     0.96098
Robustness              1                    0.66667    1
Collisions              0                    0          0
Safety Distance (m)     0.2                  0.2        0.2
Success Rate            0.978                0.923      0.952
Completion Time (s)     5.154                3.6837     5.203


Table 2 presented a comparison of the performance of three models (Hybrid DDPG-ANFIS, ANFIS, and DDPG) in
mobile robot dynamic collision avoidance. The evaluation focused on three key metrics: success rate,
robustness, and efficiency. Among the models assessed, Hybrid DDPG-ANFIS demonstrated the highest
performance, achieving the highest efficiency (0.97012), robustness (1.0, shared with DDPG), and success rate
(0.978). Notably, both Hybrid DDPG-ANFIS and DDPG exhibited excellent resilience, effectively avoiding
collisions and maintaining a 0.2-meter safety margin. In completion time, Hybrid DDPG-ANFIS (5.154 seconds)
was slightly faster than DDPG (5.203 seconds) but slower than ANFIS (3.6837 seconds). Overall, the Hybrid
DDPG-ANFIS model delivered the strongest results on success rate, robustness, and efficiency in dynamic
obstacle avoidance, while ANFIS alone completed the maneuver fastest.



4. Conclusion 

This paper investigated dynamic collision avoidance for mobile robots using a hybrid Deep Deterministic Policy 
Gradient (DDPG) and Adaptive Neuro-Fuzzy Inference System (ANFIS) algorithm. The proposed approach 
leveraged the strengths of both deep reinforcement learning and fuzzy logic, achieving exceptional performance 
in key metrics: efficiency (0.97012), robustness (1.0), safety distance maintenance (0.2 meters), success rate 
(0.978), and completion time (5.154 seconds). This translated to superior navigation capabilities in dynamic 
environments, enabling effective obstacle learning and detection. The synergy between DDPG and ANFIS
contributed to enhanced accuracy and overall navigation performance, demonstrating the significant potential of
this hybrid approach for safer and more efficient mobile robot operations. Future work could proceed in several
directions: applying the approach to larger and more complex environments; optimizing hyperparameters to
maximize performance across diverse scenarios; evaluating how well the trained model adapts to unseen
environments; confirming the feasibility of real-time obstacle avoidance in practical applications; and
investigating the integration of human input and control into the navigation framework.